Distributed packet deduplication

ABSTRACT

Introduced here are network visibility appliances capable of implementing a distributed deduplication scheme by routing traffic amongst multiple instances of a deduplication program. Data traffic can be forwarded to a pool of multiple network visibility appliances that collectively ensure no duplicate copies of data packets exist in the data traffic. The network visibility appliances can route the traffic to different instances of the deduplication program so that duplicate copies of a data packet are guaranteed to arrive at the same instance of the deduplication program, regardless of which network visibility appliance(s) initially received the duplicate copies of the data packet.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 16/001,721, filed Jun. 6, 2018, and titled “Distributed Packet Deduplication,” which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

At least one embodiment of the present disclosure pertains to techniques for eliminating duplicate copies of data packets included in network traffic received by multiple network visibility appliances.

BACKGROUND

Data traffic (or simply “traffic”) in a computer network can be analyzed to improve real-time decision making for network operations, security techniques, etc. Traffic may be acquired at numerous points by a variety of devices/applications (collectively referred to as “nodes” in the computer network), and then forwarded to a network visibility appliance able to provide extensive visibility of traffic flow. Given the complexity and volume of traffic routed through many infrastructures, various kinds of network tools are often used to identify, analyze, or handle issues plaguing the computer network. These issues can include security threats, bottlenecks, etc. Examples of such network tools include an intrusion detection system (IDS) and an intrusion prevention system (IPS).

Network visibility appliances and network tools can operate as in-band devices (also referred to as “inline devices”) or out-of-band devices. Out-of-band devices operate outside of the path of traffic between an origination node and a destination node, and thus receive copies of the data packets that make up the traffic rather than the original data packets. Out-of-band devices can freely modify the copies of the data packets because the original data packets are allowed to traverse the computer network unimpeded. Inline devices, on the other hand, operate within the path of traffic between an origination node and a destination node, and thus receive the original data packets.

BRIEF DESCRIPTION OF THE DRAWINGS

Various features of the technology will become apparent to those skilled in the art from a study of the Detailed Description in conjunction with the drawings. Embodiments of the technology are illustrated by way of example and not limitation in the drawings, in which like references may indicate similar elements.

FIG. 1A depicts an example of a network arrangement in which a network visibility appliance receives data packets from multiple devices/applications (collectively referred to as “nodes”) in a computer network.

FIG. 1B illustrates an example path of a data packet as the data packet travels from an originating device to a recipient device.

FIG. 2 depicts an example of how a visibility platform that includes a network visibility appliance can be integrated into a cloud computing platform to provide a coherent view of virtualized traffic in motion across the public cloud infrastructure for an end user.

FIG. 3 depicts one embodiment of a visibility platform that can be run entirely within a cloud environment or a non-cloud environment (e.g., as a virtual machine).

FIG. 4 illustrates how separate instances of a deduplication program can be configured to monitor traffic associated with multiple virtual machines.

FIG. 5 depicts an example of a network visibility appliance that includes a deduplication program capable of filtering duplicate copies of data packets from traffic received at an ingress port.

FIG. 6 depicts an example of a load balancer that is configured to distribute data packets received from a source node amongst multiple destination nodes in accordance with a load balancing strategy.

FIG. 7A depicts an example of a network visibility appliance that includes a load balancer that is configured to distribute incoming traffic amongst multiple instances of a deduplication program.

FIG. 7B depicts another example of a network visibility appliance that includes a load balancer configured to sort incoming data packets into batches to be distributed amongst multiple instances of a deduplication program.

FIG. 8A depicts an example of a distributed visibility fabric that includes multiple visibility appliances, each of which executes an instance of a deduplication program and a load balancer (not shown).

FIG. 8B depicts another example of a distributed visibility fabric that includes multiple network tools.

FIG. 9 depicts a process for achieving distributed deduplication by intelligently routing traffic amongst multiple instances of a deduplication program.

FIG. 10 depicts a process for implementing a distributed deduplication scheme.

FIG. 11 includes a block diagram illustrating an example of a processing system in which at least some operations described herein can be implemented.

DETAILED DESCRIPTION

A network visibility appliance can be configured to receive data packets from one or more nodes in a computer network. The network visibility appliance may be connected to one or more network tools configured to analyze the data packets (or copies of the data packets), monitor the traffic within the computer network, or block the transmission of abnormal (e.g., malicious) data packets.

Network visibility appliances have traditionally managed the bandwidth of data transfers by eliminating duplicate copies of data packets in the traffic observed within a temporal window. This task is typically performed by a computer program designed to perform a specialized data compression technique called deduplication.

Deduplication programs serve several purposes. For example, a deduplication program can be configured to reduce the number of data packets that are sent to a network tool by a network visibility appliance. As another example, a deduplication program can be configured to filter traffic to improve storage utilization. In a deduplication process, the deduplication program initially identifies incoming data packets and then stores the data packets (e.g., in cache memory). As the deduplication process continues, other incoming data packets are compared to the stored data packets and, whenever a match occurs, the redundant data packet is filtered from the traffic. Such action ensures that recipients of the traffic (e.g., network tools) are not inundated with duplicate copies of data packets. In some instances, the redundant data packet is replaced with a small reference that identifies the matching stored data packet.

Deduplication programs suffer from several drawbacks. With exponential growth in workloads within physical data centers, many end users have begun moving work processes and data to cloud computing platforms. To monitor the traffic associated with a single end user, however, a network visibility appliance may need to receive traffic from hundreds or thousands of virtual machines. Yet a single instance of a deduplication program often cannot handle the volume of traffic under consideration. Consequently, multiple instances of the deduplication program, each running in a separate network visibility appliance, are needed.

Introduced here, therefore, are network visibility appliances capable of implementing a distributed deduplication scheme by routing traffic amongst multiple instances of a deduplication program. Rather than forward all traffic associated with an end user to a single network visibility appliance for examination, the traffic can instead be forwarded to a pool of multiple network visibility appliances that collectively ensure no duplicate copies of data packets exist in the traffic. More specifically, these network visibility appliances can route the traffic to different instances of the deduplication program in such a manner that duplicate copies of a data packet are guaranteed to arrive at the same instance of the deduplication program, regardless of which network visibility appliance(s) initially received the duplicate copies of the data packet.

Terminology

References in this description to “an embodiment” or “one embodiment” mean that the particular feature, function, structure, or characteristic being described is included in at least one embodiment. Occurrences of such phrases do not necessarily refer to the same embodiment, nor are they necessarily referring to alternative embodiments that are mutually exclusive of one another.

The terms “connected,” “coupled,” or any variant thereof are intended to include any connection or coupling between two or more elements, either direct or indirect. The coupling/connection can be physical, logical, or a combination thereof. For example, devices may be electrically or communicatively coupled to one another despite not sharing a physical connection.

The sequences of steps performed in any of the processes described here are examples. However, unless contrary to physical possibility, the steps may be performed in various sequences and combinations. For example, steps could be added to, or removed from, the processes described here. Similarly, steps could be replaced or reordered. Thus, descriptions of any processes are intended to be open-ended.

Network Appliance Architecture

FIG. 1A depicts an example of a network arrangement 100 a in which a network visibility appliance 102 receives data packets from multiple devices/applications (collectively referred to as “nodes”) in a computer network 110. The nodes couple an originating device 104 (e.g., a desktop computer system) to a recipient device 108 (e.g., a server). Thus, the nodes allow data packets to be transmitted between the originating device 104 and the recipient device 108. Examples of nodes include switches (e.g., switches 106 a, 106 d), routers (e.g., routers 106 b, 106 c), network taps, etc.

Each node represents an entry point into the computer network 110. The entry points could be, and often are, from different points within the computer network 110. Generally, at least some of the nodes are operable to transmit data packets received as traffic (or duplicate copies of the data packets) to a network visibility appliance 102 for analysis. Traffic can be directed to the network visibility appliance 102 by a node that provides an entry point into the computer network 110.

Whether a node transmits the original data packets or copies of the original data packets to a device downstream of the node (e.g., the network visibility appliance 102) depends on whether the downstream device is an inline device or an out-of-band device. As noted above, inline devices receive the original data packets, while out-of-band devices receive copies of the original data packets.

Here, the network visibility appliance 102 can receive data packets from node 106 b (e.g., via transmission path 114 a) and pass at least some of the data packets to node 106 c (e.g., via transmission path 114 b). Because node 106 b is able to transmit network traffic downstream through the network visibility appliance 102, node 106 b need not be coupled directly to node 106 c (i.e., transmission path 114 c may not exist). Some or all of the nodes within the computer network 110 can be configured in a similar fashion.

When the network visibility appliance 102 is deployed as an inline device, data packets are received by the network visibility appliance 102 at a network port (also referred to as an “ingress port”). For example, data packets transmitted by node 106 b via transmission path 114 a are received by the network visibility appliance 102 at a particular ingress port. The network visibility appliance 102 may include multiple ingress ports that are coupled to different nodes in the computer network 110. The network visibility appliance 102 can be, for example, a monitoring platform that includes a chassis and interchangeable blades offering various functionalities, such as enhanced packet distribution and masking/filtering capabilities.

The network visibility appliance 102 can also transmit data packets from a network port (also referred to as an “egress port”). For example, the network visibility appliance 102 may include multiple egress ports that are coupled to different network tools 112 a-n. Each network tool 112 a-n can be deployed as an inline device or an out-of-band device at any given point in time. When a network tool is deployed as an out-of-band device, the network visibility appliance 102 creates a duplicate copy of at least some of the data packets received by the network visibility appliance 102, and then passes the duplicate copies to an egress port for transmission downstream to the out-of-band network tool. When a network tool is deployed as an inline device, the network visibility appliance 102 passes at least some of the original data packets to an egress port for transmission downstream to the inline network tool, and those data packets are then normally received back from the tool at a separate network port of the network visibility appliance 102 (i.e., assuming the data packets are not blocked by the tool).

FIG. 1B illustrates an example path of a data packet as the data packet travels from an originating device 104 to a recipient device 108. More specifically, FIG. 1B depicts a network arrangement 100 b in which the network visibility appliance 102 and a network tool 112 a are both deployed as inline devices (i.e., within the flow of network traffic). Although the transmission paths connecting the network visibility appliance 102 and network tool 112 a are half duplex wires (i.e., only transmit information in one direction), full duplex wires capable of transmitting information in both directions could also be used for some or all of the transmission paths between nodes of the computer network 110.

After receiving a data packet from node 106 b, the network visibility appliance 102 identifies a map corresponding to the data packet based on one or more characteristics of the data packet. For example, the characteristic(s) could include the communication protocol of which the data packet is a part (e.g., HTTP, TCP, IP) or a session feature (e.g., a timestamp). Additionally or alternatively, the proper map could be identified based on the network port of the network visibility appliance 102 at which the data packet was received, the source node from which the data packet was received, etc.

The map represents a policy for how the data packet is to be handled by the network visibility appliance 102. For example, the map could specify that the data packet is to be transmitted in a one-to-one configuration (i.e., from an ingress port of the network visibility appliance 102 to an egress port of the network visibility appliance 102), a one-to-many configuration (i.e., from an ingress port of the network visibility appliance 102 to multiple egress ports of the network visibility appliance 102), or a many-to-one configuration (i.e., from multiple ingress ports of the network visibility appliance 102 to an egress port of the network visibility appliance 102). Thus, a single egress port of the network visibility appliance 102 could receive data packets from one or more ingress ports of the network visibility appliance 102.
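By way of illustration only, the three configurations can be modeled as a table from ingress ports to egress ports. The following is a minimal sketch; the class name `Map`, the method `egress_ports_for`, and the port labels are hypothetical and are not drawn from any particular embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Map:
    """Hypothetical policy: which egress ports serve each ingress port."""
    routes: Dict[str, List[str]] = field(default_factory=dict)

    def egress_ports_for(self, ingress_port: str) -> List[str]:
        # In this sketch, traffic from an unlisted ingress port goes nowhere.
        return self.routes.get(ingress_port, [])


# One ingress port feeding one egress port.
one_to_one = Map({"in1": ["out1"]})
# One ingress port replicated to several egress ports.
one_to_many = Map({"in1": ["out1", "out2"]})
# Several ingress ports aggregated onto a single egress port.
many_to_one = Map({"in1": ["out1"], "in2": ["out1"]})
```

In the many-to-one case, the single egress port `out1` receives data packets from both `in1` and `in2`, which mirrors the aggregation described above.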

Often, the data packet is passed (e.g., by a processor of the network visibility appliance 102) to an egress port for transmission downstream to a network tool (e.g., a monitoring and/or security tool). Here, for example, the map may specify that the data packet is to be passed by the network visibility appliance 102 to a tool port for transmission downstream to network tool 112 a. The network visibility appliance 102 may aggregate or modify the data packet in accordance with the policy specified by the map before passing the data packet to the egress port for transmission downstream to the network tool 112 a. In some embodiments, the network visibility appliance 102 includes multiple egress ports, each of which is coupled to a different network tool or another network visibility appliance.

After analyzing the data packet, the network tool 112 a normally transmits the data packet back to the network visibility appliance 102 (i.e., assuming the network tool 112 a does not determine that the packet should be blocked), which passes the data packet to a network port for transmission downstream to another node (e.g., node 106 c).

FIG. 2 depicts an example of how a visibility platform 202 that includes a network visibility appliance can be integrated into a cloud computing platform 200 to provide a coherent view of virtualized traffic in motion across the public cloud infrastructure for an end user. Many end users (e.g., individuals and enterprises) have begun moving work processes and data to cloud computing platforms. By installing agents 204 on some or all of the virtual machines 206 belonging to the end user, the visibility platform 202 can acquire data packets (or duplicate copies of the data packets) traversing a public cloud infrastructure for further analysis in order to improve visibility into possible security risks.

In some embodiments, the visibility platform 202 is communicatively coupled to one or more network tools 208 for analyzing the virtualized traffic. The network tool(s) 208 can be hosted locally as part of the visibility platform 202 (i.e., on the cloud computing platform 200) or remotely (e.g., within an on-premises computing environment controlled by the end user). When the visibility platform 202 is entirely virtual (e.g., the network visibility appliance is comprised of a virtual programmable switch), the visibility platform 202 establishes a tunnel for delivering the virtualized traffic to the network tool(s) 208 regardless of where the network tool(s) 208 reside. However, when the visibility platform 202 is physical (e.g., the network visibility appliance is comprised of a physical programmable switch), the visibility platform 202 may establish a tunnel only for those network tool(s) 208 that are hosted remotely (e.g., are not directly coupled to the visibility platform 202 using physical cables).

A “tunnel” is a mechanism that can be used to reliably transmit traffic across a network. Before virtualized traffic is forwarded to the tunnel by the visibility platform 202 for transmission to the network tool(s) 208, the visibility platform 202 may create an outer jacket for the virtualized traffic (and any other network content) based on the type of tunnel. For example, an inner payload could be wrapped in an encapsulation by the visibility platform 202 in accordance with a Virtual Extensible LAN (VXLAN) protocol or a Generic Routing Encapsulation (GRE) protocol. The network tool(s) 208 can then remove the outer jacket upon reception and determine how the inner payload (i.e., the actual virtualized traffic) should be handled.
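As a simplified illustration of the outer jacket, the sketch below prepends and strips a basic 4-byte GRE header (no optional checksum, key, or sequence fields). The function names `gre_wrap` and `gre_unwrap` are hypothetical, and the outer IP header that would carry the GRE packet across the network is omitted for brevity.

```python
import struct

# EtherType indicating that an Ethernet frame is carried inside GRE.
GRE_PROTO_TEB = 0x6558


def gre_wrap(inner_payload: bytes) -> bytes:
    """Prepend a basic GRE header: 16 bits of flags/version (zero) + protocol type."""
    return struct.pack("!HH", 0, GRE_PROTO_TEB) + inner_payload


def gre_unwrap(packet: bytes) -> bytes:
    """Remove the outer jacket and return the inner payload."""
    if len(packet) < 4:
        raise ValueError("packet too short to hold a GRE header")
    flags_version, protocol = struct.unpack("!HH", packet[:4])
    if flags_version != 0 or protocol != GRE_PROTO_TEB:
        raise ValueError("header not supported by this sketch")
    return packet[4:]
```

A receiving tool would call `gre_unwrap` on arrival and then decide how to handle the recovered inner payload.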

The visibility platform 202 can exist as a cloud-native virtual machine (also referred to as an “unnative virtual machine”) that analyzes virtualized traffic traversing the cloud computing platform 200. Accordingly, the visibility platform 202 may not be limited by the computer hardware responsible for supporting the cloud computing platform 200.

FIG. 3 depicts one embodiment of a visibility platform 300 that can be run entirely within a cloud environment or a non-cloud environment (e.g., as a virtual machine). Thus, the visibility platform 300 may be hosted on a cloud computing platform, run on a dedicated piece of computer hardware (e.g., a monitoring platform that includes a chassis and interchangeable blades offering various functionalities, such as enhanced packet distribution and masking/filtering capabilities), or some combination thereof. For example, the visibility platform 300 could include a network visibility appliance 304 that resides on a stand-alone personal computer, a dedicated network server, or some other computing device having an x86 instruction set architecture.

In some instances, it may be desirable to run the network visibility appliance 304 as a virtual machine on a cloud computing platform (e.g., cloud computing platform 200 of FIG. 2). For example, the visibility platform 300 may exist inside of a Virtual Private Cloud (VPC) that resides within a dedicated section of an end user's virtual network within Amazon Web Services (AWS), VMware, OpenStack, etc. Such an arrangement permits the visibility platform 300 to intelligently optimize, filter, and analyze virtualized traffic across hundreds or thousands of virtual machines. Note, however, that the visibility platform 300 may also exist outside of the VPC.

The visibility platform 300 can include one or more agents 302 for mirroring virtualized traffic traversing a cloud computing platform, a network visibility appliance 304 for aggregating and filtering the virtualized traffic, one or more controllers 306, and a client 308 for managing the visibility platform 300 as a whole. Other embodiments may include a subset of these components.

As shown here, each agent 302 is fully contained within a corresponding target virtual machine 310 whose virtualized traffic is to be monitored. The term “virtualized traffic” generally refers to traffic that traverses a virtual machine. While the agent(s) 302 serve requests issued by the controller(s) 306, each agent 302 may be responsible for configuring its own interface mirrors, tunnels, etc.

The network visibility appliance 304 can include a programmable switch (also referred to as a “switching engine”). The programmable switch may be a physical switch or a virtual switch, such as a software-defined networking (SDN) switch. The network visibility appliance 304 is responsible for aggregating virtualized traffic mirrored by the agent(s) 302, and then forwarding at least some of the aggregated virtualized traffic to one or more network tools 312 for further analysis. In some embodiments, the network visibility appliance 304 filters (e.g., slices, masks, or samples) and/or replicates the aggregated virtualized traffic before forwarding it downstream to the network tool(s) 312.

The controller(s) 306, meanwhile, may be controlled by the end user via the client 308, which may be hosted on the cloud computing platform or in an on-premises computing environment controlled by the end user. In some embodiments a single controller 306 is configured to control the agent(s) 302 and the network visibility appliance 304, while in other embodiments multiple controllers 306 are configured to control the agent(s) 302 and the network visibility appliance 304. Here, for example, a first controller controls the agent(s) 302 and a second controller controls the network visibility appliance 304. However, each agent 302 could also be associated with a dedicated controller.

Together, the client 308 and the controller(s) 306 enable centralized management of the visibility platform 300 as a whole. For example, the client 308 may be configured to integrate with one or more application programming interfaces (APIs) 314 offered by the cloud computing platform in order to retrieve relevant information about the virtualized traffic being monitored (e.g., end user credentials, virtual machine addresses, virtualized traffic characteristics). In some embodiments, the client 308 supports a drag-and-drop user interface that can be used by the end user to create and implement traffic policies. Moreover, the client 308 may provide traffic policy statistics to the end user or an administrator (e.g., the manager of the visibility platform 300) for troubleshooting in real time.

By identifying the network object(s) interconnected through a visibility fabric, a traffic flow can be readily monitored regardless of whether the network visibility appliance 304 is monitoring data packets traversing a physical device or a virtual environment. Examples of network objects include raw endpoints, tunnel endpoints, application endpoints, and maps. A network visibility appliance may include one or more raw endpoints that receive traffic directly from corresponding Network Interface Cards (NICs) or virtual Network Interface Cards (vNICs). The network visibility appliance may also include one or more tunnel endpoints that send/receive traffic to/from remote locations. Examples of remote locations include other network visibility appliances, on-premises computing environments, etc. Tunnel endpoints can be created by the network visibility appliance using APIs, and tunnel endpoints are typically associated with both a remote endpoint and a specific type (e.g., VXLAN or GRE).

The network visibility appliance may also include one or more application endpoints that send/receive packets to/from application programs (also referred to as “applications”). Applications may be responsible for creating, aggregating, filtering, and/or modifying the virtualized traffic received by the network visibility appliance. Examples of applications can include masking programs, deep packet inspection programs, net flow generation programs, deduplication programs, etc.

The network visibility appliance can receive traffic at raw endpoints, tunnel endpoints, and application endpoints, and the network visibility appliance can output traffic at tunnel endpoints and application endpoints. Raw endpoints, therefore, can only receive incoming traffic, while tunnel endpoints and application endpoints are generally bi-directional (i.e., can receive and transmit traffic across different ingress and egress interfaces).

Raw endpoints can receive traffic directly from (v)NICs. However, tunnel endpoints are often the predominant way to route traffic away from a network visibility appliance (e.g., into an on-premises environment that includes one or more network tools). Moreover, although application endpoints route virtualized traffic into an environment managed by an application, the environment still typically resides within the network visibility appliance.

Distributed Packet Deduplication by Network Visibility Appliances

Deduplication programs have traditionally been used to eliminate duplicate copies of data packets in the traffic observed within a temporal window. In a computer network, there are several different scenarios in which duplicate copies of data packets can be generated.

First, duplicate copies of data packets may be spuriously generated by an application that resides on a network visibility appliance. One example of such an application is a net flow generation program. Because these duplicate copies are generated on a single network visibility appliance, a local instance of a deduplication program that resides on the network visibility appliance can readily filter these duplicate copies before the traffic leaves the network visibility appliance.

Second, duplicate copies of data packets may be generated by a source node (e.g., a network visibility appliance) during a broadcast process. For example, if the source node intends to discover where a destination node is located within a computer network, the source node may transmit a query message to one or more intermediate nodes (e.g., switches, routers, etc.). Each intermediate node will make a copy of the query message and then forward it onward to one or more other nodes. Such action is performed with the intention that a copy of the query message will eventually reach the destination node, which can then send a reply to the source node that includes a destination address. From that point onwards, the source node and the destination node can communicate with each other via a point-to-point communication protocol.

Duplicate copies of data packets may also be generated by a source node during a multicast process. In a multicast process, the source node transmits a message to multiple destination nodes rather than sending each destination node a separate message. Broadcast processes are normally avoided unless necessary to identify the location of a destination node, while multicast processes are often used to efficiently provide updates to multiple destination nodes.

Third, duplicate copies of data packets may be observed by a network appliance simply because it is monitoring virtualized traffic. As shown in FIG. 4, separate instances of a deduplication program can be configured to monitor traffic associated with multiple virtual machines. Here, for example, Deduplication Program Instance A 406 a residing on Network Appliance A 404 a is configured to examine traffic that exits Virtual Machine A 402 a, while Deduplication Program Instance B 406 b residing on Network Appliance B 404 b is configured to examine traffic that enters Virtual Machine B 402 b. In some embodiments, the traffic is collected from each virtual machine by an agent that, when deployed, resides on the virtual machine. In other embodiments, the traffic is collected from each virtual machine by some other type of flow collector 408 a-b that, when deployed, resides outside of the virtual machine. For example, each flow collector 408 a-b may interface with the appropriate cloud computing platform to request traffic corresponding to one or more virtual machines.

When Virtual Machine A 402 a communicates with Virtual Machine B 402 b, the same data packet will be captured twice. Deduplication Program Instance A 406 a will examine the data packet that is captured as it exits Virtual Machine A 402 a and Deduplication Program Instance B 406 b will examine the data packet that is captured as it enters Virtual Machine B 402 b. However, because each instance of the deduplication program only identifies duplicate copies of data packets within the traffic received by the corresponding network appliance, neither Deduplication Program Instance A 406 a nor Deduplication Program Instance B 406 b will eliminate the data packet involved in the communication. If Network Appliance A 404 a and Network Appliance B 404 b are configured to forward filtered traffic onward to a network tool 410, the network tool 410 will receive duplicate copies of the data packet.
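The scenario of FIG. 4 can be reproduced with a short sketch. The class `LocalDeduplicator` and its `filter` method are hypothetical stand-ins for two independent instances of a deduplication program; each instance remembers only the packets that it has itself received.

```python
class LocalDeduplicator:
    """Hypothetical instance that only knows about traffic it received itself."""

    def __init__(self):
        self.seen = set()

    def filter(self, packets):
        forwarded = []
        for packet in packets:
            if packet not in self.seen:
                # First sighting on this instance: remember and forward.
                self.seen.add(packet)
                forwarded.append(packet)
        return forwarded


instance_a = LocalDeduplicator()  # examines traffic exiting Virtual Machine A
instance_b = LocalDeduplicator()  # examines traffic entering Virtual Machine B

packet = b"payload sent from VM A to VM B"

# The same packet is captured once on each side of the communication.
tool_input = instance_a.filter([packet]) + instance_b.filter([packet])
```

Here `tool_input` holds two copies of the packet, because neither instance has any record of what the other instance has seen.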

Introduced here, therefore, are techniques for achieving distributed deduplication by intelligently routing traffic amongst multiple instances of a deduplication program. Each instance of the deduplication program may reside on a different network visibility appliance. Together, the multiple network visibility appliances on which the multiple instances of the deduplication program reside form a pool of network visibility appliances capable of implementing a distributed deduplication scheme. These network visibility appliances can route traffic amongst the multiple instances of the deduplication program in such a manner that duplicate copies of a data packet are guaranteed to arrive at the same instance of the deduplication program, regardless of which network visibility appliance(s) initially received the duplicate copies of the data packet.
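One way to obtain such a guarantee, offered here as an assumption rather than as the mechanism of any particular embodiment, is for every appliance in the pool to apply the same deterministic function to a key derived from each data packet. The names `choose_instance`, `route`, and `POOL` below are hypothetical.

```python
import hashlib

# Hypothetical pool; every appliance must hold the same list in the same order.
POOL = ["appliance-a", "appliance-b", "appliance-c"]


def choose_instance(packet_key: bytes, instance_count: int) -> int:
    """Map a packet key to an instance index; equal keys always map alike."""
    digest = hashlib.sha256(packet_key).digest()
    return int.from_bytes(digest[:8], "big") % instance_count


def route(packet_key: bytes) -> str:
    """Return the appliance whose deduplication instance should examine the packet."""
    return POOL[choose_instance(packet_key, len(POOL))]
```

Because the result depends only on the packet key and the shared pool definition, two appliances that receive copies of the same data packet independently select the same destination instance. A simple modulo mapping reassigns many keys whenever the pool size changes, so a deployment that resizes its pool often might prefer a scheme such as consistent hashing.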

FIG. 5 depicts an example of a network visibility appliance 500 that includes a deduplication program 502 capable of filtering duplicate copies of data packets from traffic received at an ingress port 504. Generally, the deduplication program 502 filters traffic to ensure that duplicate copies of data packets are not forwarded downstream to a network tool via an egress port 506 (also referred to as a “tool port”).

In a deduplication process, the deduplication program 502 initially identifies data packets received at the ingress port 504 and then stores the data packets (e.g., in memory 508) during an identification stage. Alternatively, the deduplication program 502 may populate a data structure in the memory 508 with information regarding the data packets received at the ingress port 504. For example, the data structure may include a separate record for each received data packet that specifies one or more characteristics (e.g., source, packet length, destination, protocol). As the deduplication process continues, the deduplication program 502 compares other data packets received at the ingress port 504 to the data packets stored in the memory 508 or the data structure. Whenever a match occurs, the redundant data packet is filtered from the traffic before the traffic is forwarded downstream via the egress port 506. Such action ensures that a recipient (e.g., a network tool) is not inundated with duplicate copies of data packets. In some embodiments, the redundant data packet is replaced with a reference that identifies the matching stored data packet.
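The record-based variant of this process can be sketched as follows. The sketch assumes that records are retained only for a fixed temporal window; the class `WindowedDeduplicator`, the helper `record_for`, the dictionary keys, and the default window length are all hypothetical.

```python
import time
from collections import OrderedDict


class WindowedDeduplicator:
    """Hypothetical deduplicator that remembers records for a short window."""

    def __init__(self, window_seconds=0.05, clock=time.monotonic):
        self.window_seconds = window_seconds  # assumed window length
        self.clock = clock                    # injectable so time can be controlled
        self.records = OrderedDict()          # record -> time first seen, oldest first

    def _expire(self, now):
        # Drop records that have aged out of the temporal window.
        while self.records:
            record, first_seen = next(iter(self.records.items()))
            if now - first_seen <= self.window_seconds:
                break
            del self.records[record]

    def is_duplicate(self, record):
        now = self.clock()
        self._expire(now)
        if record in self.records:
            return True   # match: the caller filters this packet
        self.records[record] = now
        return False      # first sighting: the caller forwards this packet


def record_for(packet):
    # One record per packet, built from the characteristics named above.
    return (packet["source"], packet["destination"],
            packet["protocol"], packet["length"])
```

A caller would forward a data packet only when `is_duplicate(record_for(packet))` returns `False`. The four characteristics used here would not, on their own, distinguish separate packets of one flow that happen to share a length, so a practical record would likely include further fields.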

In some embodiments, the deduplication program 502 compares an entire received data packet to the data packets stored in the memory 508. In such embodiments, the deduplication program 502 may determine that the received data packet is a duplicate copy only if it is a complete match with a stored data packet. In other embodiments, the deduplication program 502 compares certain field(s) of a received data packet to corresponding field(s) of the stored data packets. This technique (also referred to as the “field matching technique”) may be used in networking situations to reduce latency caused by filtering. Said another way, the field matching technique is often employed in networking situations because the network visibility appliance 500 must forward the traffic received at the ingress port 504 within a specified timeframe.

Moreover, data packets received by the network visibility appliance 500 at the ingress port 504 can come in a variety of sizes. For example, data packets can range from 64 bytes to over 9,000 bytes. When the deduplication program 502 is executed by a physical programmable switch, these large data packets can be handled without issue. However, when the deduplication program 502 is executed by a virtual programmable switch, these large data packets cannot be handled without resulting in undesirable latency. Therefore, the field matching technique may be employed by virtual programmable switches to squash duplicate copies of data packets with high confidence without examining the entire payload.
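As a minimal sketch of the field matching technique, assuming a simplified packet record and a bounded in-memory record store, the comparison could be expressed as follows. The names Packet, field_key, and FieldMatchingDeduplicator, the choice of compared fields, and the eviction policy are hypothetical and introduced for illustration only.

```python
from collections import OrderedDict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical, simplified packet record (illustration only)."""
    source: str
    destination: str
    protocol: str
    length: int
    payload: bytes


def field_key(packet: Packet) -> tuple:
    # Compare selected characteristics rather than the entire data packet.
    return (packet.source, packet.destination, packet.protocol, packet.length)


class FieldMatchingDeduplicator:
    def __init__(self, max_records: int = 1024):
        # Bounded record store standing in for the data structure in memory.
        # The bound and oldest-first eviction are assumptions.
        self._records = OrderedDict()
        self._max_records = max_records

    def is_duplicate(self, packet: Packet) -> bool:
        key = field_key(packet)
        if key in self._records:
            return True
        self._records[key] = None
        if len(self._records) > self._max_records:
            self._records.popitem(last=False)  # evict the oldest record
        return False

    def filter(self, packets):
        # Return only packets whose selected fields have not been seen before.
        return [p for p in packets if not self.is_duplicate(p)]
```

Because only a small tuple is compared per data packet, the cost of a lookup in this sketch does not grow with payload size, which is the property the field matching technique relies on.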

As noted above, the deduplication program 502 will only compare incoming data packets to those data packets stored in the memory 508 of the network visibility appliance 500. However, many end users have a sufficiently large volume of traffic that multiple network visibility appliances, each running a separate instance of the deduplication program, must be used to monitor the traffic. In a distributed environment of multiple network visibility appliances, it is important that all potential duplicate copies of a data packet be examined by the same instance of the deduplication program. Load balancing mechanisms (also referred to as “load balancers”) may be used to ensure that the traffic received at a given network visibility appliance is properly distributed amongst the multiple network visibility appliances.

FIG. 6 depicts an example of a load balancer 600 that is configured to distribute data packets received from a source node 602 amongst multiple destination nodes 604 a-n in accordance with a load balancing strategy. The source node 602 may be an agent deployed on a virtual machine, a flow collector deployed outside of a virtual machine, a cloud computing platform, etc. The destination nodes 604 a-n, meanwhile, may be network visibility appliances having separate instances of a deduplication program. Thus, the load balancer 600 can ensure that traffic received by a pool of multiple network appliances is distributed amongst the multiple network appliances in a roughly equivalent manner.

The load balancer 600 examines incoming traffic to determine which destination node of the multiple destination nodes 604 a-n each data packet should be forwarded to. To properly balance the incoming traffic across the multiple destination nodes 604 a-n, the load balancer 600 can apply a transformation function that creates a value for each data packet and then identify the appropriate destination node for each data packet based on the corresponding value. One example of a transformation function is the highest random weight (HRW) hashing algorithm (also referred to as the “rendezvous hashing algorithm”). The HRW hashing algorithm is designed to achieve distributed agreement on a set of k options out of a possible set of n options.

When executed by the load balancer 600, the HRW hashing algorithm will assign each destination node (V_(Dj)) a weight for each data packet in the incoming traffic, and then forward each data packet to the destination node having the largest weight. As further described below, multiple load balancers can be used to ensure that duplicate copies of data packets are forwarded to the same destination node. Proper distribution, however, requires that each load balancer execute the same transformation function. For example, each load balancer involved in a distributed deduplication scheme may apply an identical hash function. When a transformation function is agreed upon by all load balancers in a visibility fabric, each load balancer can independently route traffic based on values computed using the transformation function. For example, each load balancer may independently compute weights using the HRW hashing algorithm and then pick whichever destination node corresponds to the largest weight.
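The weight computation described above can be sketched as follows. This is an illustrative sketch only: the use of SHA-256 as the underlying hash, the node identifiers, and the function names hrw_weight and pick_destination are assumptions that do not come from the disclosure.

```python
import hashlib


def hrw_weight(node_id: str, packet_key: bytes) -> int:
    # Weight for one (destination node, data packet) pair.
    # SHA-256 is an assumed choice; any hash shared by all load balancers works.
    digest = hashlib.sha256(node_id.encode("utf-8") + b"|" + packet_key).digest()
    return int.from_bytes(digest[:8], "big")


def pick_destination(nodes, packet_key: bytes) -> str:
    # Forward the data packet to the destination node with the largest weight.
    return max(nodes, key=lambda node: hrw_weight(node, packet_key))
```

Two load balancers that call pick_destination with the same set of nodes and the same packet key will select the same destination node without exchanging any messages, since the weights depend only on the inputs. In this sketch, the packet key is assumed to be derived from fields that are identical across duplicate copies of a data packet.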

FIG. 7A depicts an example of a network visibility appliance 700 a that includes a load balancer 704 a that is configured to distribute incoming traffic amongst multiple instances of a deduplication program. After receiving data packets at an ingress port 702, the network visibility appliance 700 a can split the data packets into multiple batches using the load balancer 704 a. For example, the load balancer 704 a may apply a transformation function that causes a value to be generated for each data packet, and then separate the data packets into batches based on these values. The value assigned to each data packet may be based on data packet characteristics, such as the communication protocol of which the data packet is a part (e.g., HTTP, TCP, UDP, IPv4, IPv6), a sequence number, a session feature (e.g., a timestamp), the ingress port at which the data packet was received, a source address, a destination address, header length, payload length, etc. Additionally or alternatively, the value assigned to each data packet may be based on the content of a certain field included in, for example, the header.

Here, the load balancer 704 a is configured to split the data packets into three separate batches. Data packets having a first value (or a value within a first set of values) will be filtered into a first batch, data packets having a second value (or a value within a second set of values) will be filtered into a second batch, and data packets having a third value (or a value within a third set of values) will be filtered into a third batch. The load balancer 704 a may also be able to access a data structure that specifies how each batch of data packets should be handled. Here, the third batch of data packets is forwarded to a deduplication program 706 a for examination. Data packets in the third batch that survive examination by the deduplication program 706 a can be forwarded to a third egress port 712 for transmission downstream to a network tool. Meanwhile, the first batch of data packets and the second batch of data packets are forwarded to a first egress port 708 and a second egress port 710, respectively, for transmission downstream to different network visibility appliances. This may be done so that the first batch of data packets and the second batch of data packets can be examined by other instances of the deduplication program that reside on other network visibility appliances. For example, transmission of the first batch of data packets to Network Visibility Appliance A may cause the first batch of data packets to be examined by an instance of the deduplication program that resides on Network Visibility Appliance A. Similarly, transmission of the second batch of data packets to Network Visibility Appliance B may cause the second batch of data packets to be examined by an instance of the deduplication program that resides on Network Visibility Appliance B.

In some embodiments, the load balancer 704 a has access to a data structure that maps values amongst multiple network visibility appliances or multiple instances of the deduplication program. Each value may be mapped to a single network visibility appliance or single instance of the deduplication program. Accordingly, to determine which batch a given data packet belongs to, the load balancer 704 a can access the data structure to determine which network visibility appliance or instance of the deduplication program is specified by an entry corresponding to the value created for the given data packet. As further described below, the data structure may be dynamically edited responsive to detecting a change in the status of a network visibility appliance. Accordingly, if an existing network visibility appliance becomes inaccessible, all entries in the data structure corresponding to the existing network visibility appliance can be remapped to different network visibility appliance(s). Similarly, if a new network visibility appliance becomes accessible, one or more entries in the data structure corresponding to existing network visibility appliance(s) can be remapped to the new network visibility appliance. Generally, the load balancer 704 a is completely client-based. Thus, the load balancer 704 a may be able to fully function without communicating with either the network visibility appliance(s) to which it may transmit traffic or the virtual machine(s) from which it may receive traffic.
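One possible shape for such a data structure is sketched below, assuming that each value is a small integer (e.g., a hash bucket index) and that each appliance is identified by a string. The functions build_mapping, remove_appliance, and add_appliance, as well as the round-robin remapping policy, are hypothetical and shown for illustration only.

```python
def build_mapping(num_values: int, appliances: list) -> dict:
    # Each value (here, a bucket index) maps to exactly one appliance.
    return {value: appliances[value % len(appliances)]
            for value in range(num_values)}


def remove_appliance(mapping: dict, lost: str) -> None:
    # Remap every entry that pointed at the inaccessible appliance.
    remaining = sorted(set(mapping.values()) - {lost})
    if not remaining:
        raise ValueError("no appliance left to take over the entries")
    orphaned = [value for value in sorted(mapping) if mapping[value] == lost]
    for i, value in enumerate(orphaned):
        mapping[value] = remaining[i % len(remaining)]


def add_appliance(mapping: dict, new: str) -> None:
    # Remap a share of the existing entries to the newly accessible appliance.
    pool_size = len(set(mapping.values()) | {new})
    for value in sorted(mapping)[::pool_size]:
        mapping[value] = new
```

In this sketch, entries that did not point at the lost appliance are left untouched, so only traffic formerly bound for that appliance is redirected.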

FIG. 7B depicts another example of a network visibility appliance 700 b that includes a load balancer 704 b configured to sort incoming data packets into batches to be distributed amongst multiple instances of a deduplication program. For data packets received at ingress port 702, the load balancer 704 b of FIG. 7B may operate the same as the load balancer 704 a of FIG. 7A. Thus, a first batch of data packets and a second batch of data packets may be forwarded to a first egress port 708 and a second egress port 710, respectively, for transmission downstream to different network appliances, while a third batch of data packets may be forwarded to a deduplication program 706 b for examination. Here, however, the network visibility appliance 700 b also receives data packets at a second ingress port 714 and a third ingress port 716. These data packets may have been forwarded to the network visibility appliance 700 b by the other network visibility appliances that are connected to the first egress port 708 and the second egress port 710.

Generally, the data packets received at the second ingress port 714 and the third ingress port 716 correspond to batches created by the load balancers residing on each of these other network appliances. For example, a load balancer residing on Network Visibility Appliance A may have created a batch of data packets that is subsequently received by the network visibility appliance 700 b at the second ingress port 714. Similarly, a load balancer residing on Network Visibility Appliance B may have created a batch of data packets that is subsequently received by the network visibility appliance 700 b at the third ingress port 716. Rather than be directed to the load balancer 704 b, these data packets may be forwarded directly to the deduplication program 706 b for examination. Such action may occur if other load balancers (e.g., those residing on Network Visibility Appliance A and Network Visibility Appliance B) have determined that these data packets should be examined by the deduplication program 706 b. Note, however, that these data packets could instead be forwarded to the load balancer 704 b. Because the load balancer 704 b applies the same transformation function as the other load balancers, all of the data packets received at the second ingress port 714 and the third ingress port 716 will be sorted into the third batch that is forwarded to the deduplication program 706 b for examination.

FIG. 8A depicts an example of a distributed visibility fabric 800 a that includes multiple visibility appliances 802 a-c, each of which executes an instance of a deduplication program 804 a-c and a load balancer (not shown). FIG. 8B depicts another example of a distributed visibility fabric 800 b that includes multiple network tools 806 a-c. By working in concert with one another, the multiple visibility appliances 802 a-c can ensure that potential duplicate copies of a data packet will be examined by the same instance of the deduplication program.

Each network visibility appliance can receive traffic at a network port. Here, for example, network visibility appliance 802 a receives virtualized traffic corresponding to a series of virtual machines (i.e., VM_(A1), VM_(A2), . . . VM_(AK)) at a first network port (N₁). The first network port may also be referred to as an “ingress port.” Upon receiving the traffic, a load balancer can sort the data packets into one or more batches as shown in FIGS. 7A-B. Here, the load balancer has sorted the data packets into three separate batches of data packets. A first batch of data packets can be forwarded to a local deduplication program 804 a for examination. As shown in FIG. 8A, data packets in the first batch that survive examination by the local deduplication program 804 a can be forwarded to a second network port (N₂) for transmission downstream to a network tool 806. The second network port may also be referred to as a “tool port.” As shown in FIG. 8B, data packets in the first batch that survive examination by the local deduplication program 804 a could also be forwarded to multiple tool ports. For example, the network visibility appliance 802 a may apply additional filter(s) to the surviving data packets in the first batch to determine whether certain subsets of these data packets should be dropped, modified, forwarded to a certain type of network tool, etc.

Meanwhile, a second batch of data packets and a third batch of data packets can be forwarded to different network ports for transmission downstream. Here, for example, the second batch of data packets is forwarded to a third network port (N₃) for transmission to network visibility appliance 802 b and the third batch of data packets is forwarded to a fourth network port (N₄) for transmission to network visibility appliance 802 c. The third and fourth network ports may also be referred to as “egress ports.”

Each network visibility appliance will typically operate in a substantially similar manner. Thus, each network visibility appliance may use a load balancer to sort incoming data packets into batches, identify at least one batch to be forwarded to a local deduplication program for examination, identify at least one batch to be forwarded to another network visibility appliance for examination by a remote deduplication program, etc. However, if each load balancer is configured to apply the same transformation function, then each instance of the deduplication program will examine different subsets of traffic. This ensures that data packets will be forwarded in such a manner that duplicate copies of a data packet are guaranteed to arrive at the same instance of the deduplication program, regardless of which network visibility appliance(s) initially received the duplicate copies of the data packet.

For example, network visibility appliance 802 a may receive traffic that is sorted into three separate batches of data packets based on the value assigned to each data packet by a first load balancer. The first load balancer may determine that a first batch of data packets should be forwarded to deduplication program 804 a for examination. The first batch of data packets may include all data packets in the traffic that have a certain characteristic. Meanwhile, network visibility appliance 802 b may receive traffic that is also sorted into three separate batches of data packets based on the value assigned to each data packet by a second load balancer. The second load balancer may determine that a second batch of data packets should be examined by deduplication program 804 a because these data packets share the certain characteristic in common with the first batch of data packets. Thus, the load balancer residing on network visibility appliance 802 b may cause the second batch of data packets to be forwarded to a network port (e.g., N₃) for transmission to network visibility appliance 802 a. Such action can be carried out across the multiple network visibility appliances 802 a-c to ensure that duplicate copies of a data packet will be examined by the same instance of the deduplication program.
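The convergence of duplicate copies on a single instance can be illustrated with a small simulation. The sketch below assumes a pool of three appliances and a shared transformation function built from CRC-32 modulo the pool size; the names APPLIANCES, owner, and simulate are hypothetical, and the packet keys are assumed to be identical across duplicate copies.

```python
import zlib

# Hypothetical identifiers for the three appliances in the pool.
APPLIANCES = ["802a", "802b", "802c"]


def owner(packet_key: bytes) -> str:
    # Shared transformation function (an assumed CRC-32 based choice).
    return APPLIANCES[zlib.crc32(packet_key) % len(APPLIANCES)]


def simulate(arrivals):
    # arrivals: list of (receiving appliance, packet key) pairs.
    seen = {name: set() for name in APPLIANCES}
    survivors = []
    for receiver, packet_key in arrivals:
        target = owner(packet_key)  # same result on every appliance
        hop = "local" if target == receiver else "tunnel"
        if packet_key in seen[target]:
            continue  # duplicate copy filtered by the owning instance
        seen[target].add(packet_key)
        survivors.append((target, hop, packet_key))
    return survivors
```

Because the owning instance depends only on the packet key and never on the receiving appliance, a copy that arrives at one appliance and a copy that arrives at another are examined against the same set of stored records.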

FIG. 9 depicts a process 900 for achieving distributed deduplication by intelligently routing traffic amongst multiple instances of a deduplication program. Initially, traffic is received at an ingress port of a network visibility appliance (step 901). The traffic may include, for example, virtualized traffic associated with one or more virtual machines.

The network visibility appliance can then prompt a load balancer to apply a transformation function to generate a value for each data packet (step 902), and then access a data structure that maps the values amongst multiple instances of a deduplication program or multiple network visibility appliances (step 903). Generally, each value is mapped to only a single instance of the deduplication program. Accordingly, when the load balancer accesses the data structure, the load balancer will be able to identify a single destination for a given data packet. In some embodiments, the load balancer separates the traffic into multiple batches of data packets based on these values (step 904). For example, the load balancer may create a first batch that includes all data packets corresponding to entries in the data structure that specify a first instance of the deduplication program, a second batch that includes all data packets corresponding to entries in the data structure that specify a second instance of the deduplication program, etc.

The load balancer can forward at least one batch of data packets to a local instance of the deduplication program for examination (step 905). Data packets in the at least one batch that survive examination by the local instance of the deduplication program may be forwarded to a tool port for transmission to a network tool. The load balancer can also forward at least one other batch of data packets to an egress port for transmission to a second network visibility appliance (step 906). Such action may occur if the load balancer determines (e.g., by examining the data structure) that the at least one other batch is to be examined by a remote instance of the deduplication program that resides on the second network visibility appliance.
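Steps 902 through 906 can be sketched as two small functions. The sketch assumes that each data packet is represented as raw bytes, that the transformation function is CRC-32 modulo the number of entries in the data structure, and that egress ports are identified by strings; sort_into_batches, dispatch, value_map, and egress_ports are hypothetical names introduced for illustration.

```python
import zlib
from collections import defaultdict


def sort_into_batches(packets, value_map):
    # value_map: {value: name of the deduplication instance that owns it},
    # with values assumed to be 0 .. len(value_map) - 1.
    batches = defaultdict(list)
    for packet in packets:
        value = zlib.crc32(packet) % len(value_map)  # step 902 (assumed hash)
        destination = value_map[value]               # step 903
        batches[destination].append(packet)          # step 904
    return dict(batches)


def dispatch(batches, local_name, egress_ports):
    # Step 905: the batch owned by the local instance stays on this appliance.
    local_batch = batches.get(local_name, [])
    # Step 906: every other batch is keyed by the egress port leading to the
    # appliance that hosts the remote instance.
    remote = {egress_ports[name]: batch
              for name, batch in batches.items() if name != local_name}
    return local_batch, remote
```

For instance, calling dispatch with local_name set to "A" and egress_ports set to {"B": "N3", "C": "N4"} returns the local batch together with a dictionary of batches keyed by "N3" and "N4".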

In some embodiments, the network visibility appliance can be configured to dynamically modify the data structure to alter traffic distribution patterns as existing network visibility appliances become unavailable, new network visibility appliances become available, etc. For example, the network visibility appliance may receive an indication that the second network visibility appliance is not presently accessible (step 907). In such embodiments, the network visibility appliance may modify entries in the data structure that correspond to the at least one other batch of data packets to indicate a third instance of the deduplication program or a third network visibility appliance (step 908). Modifying the entries will cause the load balancer to forward the at least one other batch of data packets to another egress port for transmission to the third network visibility appliance.

Moreover, the network visibility appliance may be configured to receive a batch of data packets at another ingress port. As shown in FIGS. 8A-B, the batch of data packets may be transmitted by another network visibility appliance (e.g., the second network visibility appliance or the third network visibility appliance) responsive to a determination that the batch of data packets is to be examined by the local instance of the deduplication program. In such embodiments, the batch of data packets can be forwarded to the local instance of the deduplication program for examination.

FIG. 10 depicts a process 1000 for implementing a distributed deduplication scheme. While the steps of process 1000 may be described as being performed by a controller configured to manage multiple network visibility appliances, those skilled in the art will recognize that the steps could also be performed by one of the network visibility appliances.

Initially, a controller identifies multiple network visibility appliances to be included in a distributed deduplication scheme (step 1001). In some embodiments, each network visibility appliance of the multiple network visibility appliances is associated with the same end user (e.g., individual or enterprise). In other embodiments, the proper number of network visibility appliances is determined based on the volume of traffic expected to be examined. Traffic volume may be estimated based on historical volumes, the number of virtual machines to be monitored, etc.

The controller can then instantiate a separate load balancer on each network visibility appliance (step 1002), as well as instantiate a separate instance of a deduplication program on each network visibility appliance (step 1003). As described above, the multiple instances of the deduplication program may be used to filter volumes of traffic that could not be handled by a single deduplication program.

The controller can also establish a communication channel between each pair of network visibility appliances (step 1004). To facilitate the creation of each communication channel, the controller may configure an ordered list of network ports for each load balancer as shown in Table I.

TABLE I
Ordered list of network ports for each load balancer instantiated on a pool of n network visibility appliances, where D_(i) is the deduplication program instance on network visibility appliance i and V_(i) is a tunnel connection to network visibility appliance i.

Network Visibility Appliance    Ordered List of Network Ports
1                               [D₁, V₂, . . . , V_(n−1), V_(n)]
2                               [V₁, D₂, . . . , V_(n−1), V_(n)]
. . .                           . . .
i                               [V₁, V₂, . . . , V_(i−1), D_(i), V_(i+1), . . . , V_(n−1), V_(n)]
. . .                           . . .
n − 1                           [V₁, V₂, . . . , D_(n−1), V_(n)]
n                               [V₁, V₂, . . . , V_(n−1), D_(n)]

Thus, each network visibility appliance will include a network port corresponding to each other network visibility appliance of the multiple network visibility appliances. In some embodiments the network port is bidirectional (i.e., can transmit and receive data packets), while in other embodiments the network port is unidirectional (i.e., can only transmit or receive data packets). If the network port is unidirectional, each communication channel may correspond to a pair of network ports (e.g., an ingress port through which to receive data packets and an egress port through which to transmit data packets).
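The ordered lists of Table I follow a regular pattern that can be generated programmatically. In the sketch below, ordered_port_list reproduces the row for network visibility appliance i in a pool of n appliances. The helper port_for_value, which indexes the list with a value taken modulo n, is an assumption about how such a list might be consulted and is not stated in the disclosure.

```python
def ordered_port_list(i: int, n: int) -> list:
    # Row i of Table I: position j holds D_j when j is the local appliance
    # and V_j (a tunnel connection to appliance j) otherwise.
    return [f"D{j}" if j == i else f"V{j}" for j in range(1, n + 1)]


def port_for_value(value: int, i: int, n: int) -> str:
    # Assumed lookup: the same value selects the same position on every
    # appliance, so all appliances agree on which instance owns the packet.
    return ordered_port_list(i, n)[value % n]
```

Under this assumption, a value that selects D₂ on appliance 2 selects V₂ on every other appliance, so the data packet reaches the deduplication program instance on appliance 2 either way.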

Each communication channel may be established via a tunnel between the corresponding network visibility appliances. As noted above, a “tunnel” is a mechanism that can be used to reliably transmit traffic across a network. Accordingly, traffic may be transmitted between pairs of network visibility appliances that each include a tunnel endpoint. The number of tunnels required to create a fully connected mesh between n network visibility appliances is given by:

$C = \frac{n(n - 1)}{2}.$

Furthermore, each network visibility appliance included in the fully connected mesh will include n−1 tunnel endpoints (i.e., a tunnel endpoint for each remote instance of the deduplication program). While FIGS. 8A-B include 3 network visibility appliances, a visibility fabric could include any number of network visibility appliances. For example, a visibility fabric that includes 32 network visibility appliances (i.e., n=32) and has endpoint-to-network visibility appliance mapping ratios of 8:1-32:1 can readily support distributed deduplication across 256-1,024 different endpoints (e.g., virtual machines). Larger configurations (i.e., n>32) are also possible, though these situations may employ a multi-level hierarchy of network visibility appliances to cascade traffic across multiple hierarchical levels.
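The sizing figures above follow from simple arithmetic, which can be checked with the short sketch below. The function names are hypothetical; the formulas are the ones given in the text.

```python
def tunnels_in_full_mesh(n: int) -> int:
    # C = n(n - 1) / 2 tunnels for a fully connected mesh of n appliances.
    return n * (n - 1) // 2


def tunnel_endpoints_per_appliance(n: int) -> int:
    # One tunnel endpoint for each remote instance of the deduplication program.
    return n - 1


def supported_endpoints(n: int, ratio: int) -> int:
    # Endpoints (e.g., virtual machines) covered at a given ratio of
    # endpoints to network visibility appliances.
    return n * ratio
```

For the three appliances of FIGS. 8A-B, tunnels_in_full_mesh(3) evaluates to 3. For n=32, the mesh requires 496 tunnels with 31 tunnel endpoints per appliance, and mapping ratios of 8:1 and 32:1 correspond to 256 and 1,024 endpoints, respectively.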

The controller can then program the separate load balancers to apply an identical transformation function to incoming data packets (step 1005). For example, each load balancer may be programmed to apply the same hash function. When a transformation function is agreed upon by all load balancers in a visibility fabric, each load balancer can independently route traffic based on values computed using the transformation function. For example, each load balancer may independently compute weights using the HRW hashing algorithm and then pick whichever destination node corresponds to the largest weight.

These steps may be performed in various sequences. For example, each load balancer could be programmed to apply an identical transformation function before being instantiated on a corresponding network visibility appliance. As another example, a separate instance of the deduplication program could be instantiated on each network visibility appliance before a separate load balancer is instantiated on each network visibility appliance.
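As a rough sketch of what a controller might produce when carrying out process 1000, the function below builds a per-appliance plan as plain dictionaries. The function plan_fabric, the configuration keys, and the transformation label "hrw-sha256" are all hypothetical; an actual controller would issue device-specific configuration rather than return a dictionary.

```python
from itertools import combinations


def plan_fabric(appliance_names, transform="hrw-sha256"):
    # Steps 1002, 1003, and 1005: one load balancer and one deduplication
    # instance per appliance, all programmed with the same transformation.
    plan = {
        name: {
            "load_balancer": {"transform": transform},
            "dedup_instance": f"dedup@{name}",
            "tunnels": [],
        }
        for name in appliance_names
    }
    # Step 1004: one communication channel per pair of appliances.
    for a, b in combinations(appliance_names, 2):
        plan[a]["tunnels"].append(b)
        plan[b]["tunnels"].append(a)
    return plan
```

Since the plan is assembled in a single pass, the order in which the load balancers and deduplication instances are created is immaterial in this sketch, consistent with the observation that the steps may be performed in various sequences.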

Processing System

FIG. 11 includes a block diagram illustrating an example of a processing system 1100 in which at least some operations described herein can be implemented. For example, the processing system 1100 may be responsible for generating an interface through which an end user manages multiple network visibility appliances involved in a distributed deduplication scheme. As another example, at least a portion of the processing system 1100 may be included in a computing device (e.g., a server) that supports a network visibility appliance and/or a cloud computing platform. The processing system 1100 may include one or more processors 1102, main memory 1106, non-volatile memory 1110, network adapter 1112 (e.g., network interfaces), display 1118, input/output devices 1120, control device 1122 (e.g., keyboard and pointing devices), drive unit 1124 including a storage medium 1126, and signal generation device 1130 that are communicatively connected to a bus 1116. The bus 1116 is illustrated as an abstraction that represents any one or more separate physical buses, point-to-point connections, or both connected by appropriate bridges, adapters, or controllers. The bus 1116, therefore, can include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus or PCI-Express bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), IIC (I2C) bus, or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus, also called “Firewire.” A bus may also be responsible for relaying data packets (e.g., via full or half duplex wires) between components of a network appliance, such as a switching engine, network port(s), tool port(s), etc.

In various embodiments, the processing system 1100 operates as a standalone device, although the processing system 1100 may be connected (e.g., wired or wirelessly) to other devices. For example, the processing system 1100 may include a terminal that is coupled directly to a network appliance. As another example, the processing system 1100 may be wirelessly coupled to the network appliance.

In various embodiments, the processing system 1100 may be a server computer, a client computer, a personal computer (PC), a user device, a tablet PC, a laptop computer, a personal digital assistant (PDA), a cellular telephone, an iPhone, an iPad, a Blackberry, a processor, a telephone, a web appliance, a network router, switch or bridge, a console, a hand-held console, a (hand-held) gaming device, a music player, any portable, mobile, hand-held device, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by the processing system 1100.

While the main memory 1106, non-volatile memory 1110, and storage medium 1126 (also called a “machine-readable medium”) are shown to be a single medium, the terms “machine-readable medium” and “storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store one or more sets of instructions 1128. The terms “machine-readable medium” and “storage medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the processing system 1100 and that cause the processing system 1100 to perform any one or more of the methodologies of the presently disclosed embodiments.

In general, the routines that are executed to implement the technology may be implemented as part of an operating system or a specific application, component, program, object, module, or sequence of instructions (collectively referred to as “computer programs”). The computer programs typically comprise one or more instructions (e.g., instructions 1104, 1108, 1128) set at various times in various memory and storage devices in a computer, and that, when read and executed by one or more processing units or processors 1102, cause the processing system 1100 to perform operations to execute elements involving the various aspects of the disclosure.

Moreover, while embodiments have been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments are capable of being distributed as a program product in a variety of forms, and that the disclosure applies equally regardless of the particular type of machine or computer-readable media used to actually effect the distribution.

Further examples of machine-readable storage media, machine-readable media, or computer-readable (storage) media include recordable type media such as volatile and non-volatile memory devices 1110, floppy and other removable disks, hard disk drives, optical disks (e.g., Compact Disk Read-Only Memory (CD-ROMs), Digital Versatile Disks (DVDs)), and transmission type media such as digital and analog communication links.

The network adapter 1112 enables the processing system 1100 to mediate data in a network 1114 with an entity that is external to the processing system 1100, such as a network appliance, through any known and/or convenient communications protocol supported by the processing system 1100 and the external entity. The network adapter 1112 can include one or more of a network adaptor card, a wireless network interface card, a router, an access point, a wireless router, a switch, a multilayer switch, a protocol converter, a gateway, a bridge, bridge router, a hub, a digital media receiver, and/or a repeater.

The network adapter 1112 can include a firewall which can, in some embodiments, govern and/or manage permission to access/proxy data in a computer network, and track varying levels of trust between different machines and/or applications. The firewall can be any number of modules having any combination of hardware and/or software components able to enforce a predetermined set of access rights between a particular set of machines and applications, machines and machines, and/or applications and applications, for example, to regulate the flow of traffic and resource sharing between these varying entities. The firewall may additionally manage and/or have access to an access control list which details permissions including, for example, the access and operation rights of an object by an individual, a machine, and/or an application, and the circumstances under which the permission rights stand.

Other network security functions can be performed or included in the functions of the firewall, including intrusion prevention, intrusion detection, next-generation firewall, personal firewall, etc.

As indicated above, the techniques introduced here can be implemented by, for example, programmable circuitry (e.g., one or more microprocessors) programmed with software and/or firmware, entirely in special-purpose hardwired (i.e., non-programmable) circuitry, or in a combination of such forms. Special-purpose circuitry can be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), etc.

Note that any of the embodiments described above can be combined with another embodiment, except to the extent that it may be stated otherwise above or to the extent that any such embodiments might be mutually exclusive in function and/or structure.

Although the present invention has been described with reference to specific exemplary embodiments, it will be recognized that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than a restrictive sense.

What is claimed is:
 1. A computer-implemented method comprising: receiving virtualized traffic at a first ingress port of a first network visibility appliance; separating the virtualized traffic into a first batch of data packets that share a first characteristic in common, the first batch to be examined by a first instance of a deduplication program that executes in the first network visibility appliance, and a second batch of data packets that share a second characteristic in common, the second batch to be examined by a second instance of the deduplication program that executes in a second network visibility appliance; receiving a third batch of data packets at a second ingress port of the first network visibility appliance; determining that all data packets in the third batch share the first characteristic in common; forwarding all data packets of the first batch and all data packets of the third batch to the first instance of the deduplication program for examination; and forwarding all data packets of the second batch to an egress port for transmission to the second network visibility appliance.
 2. The computer-implemented method of claim 1, wherein the virtualized traffic is associated with a first virtual machine, and wherein the third batch of data packets is included in virtualized traffic associated with a second virtual machine.
 3. The computer-implemented method of claim 1, wherein said separating comprises: generating a hash value for each data packet included in the virtualized traffic, thereby producing a plurality of hash values; accessing a data structure that maps the plurality of hash values amongst a plurality of instances of the deduplication program, wherein each hash value is mapped to only a single instance of the deduplication program, and wherein each instance of the deduplication program executes in a different network visibility appliance; determining that each data packet in the first batch of data packets corresponds to an entry in the data structure that specifies the first instance of the deduplication program; and determining that each data packet in the second batch of data packets corresponds to an entry in the data structure that specifies the second instance of the deduplication program.
 4. The computer-implemented method of claim 3, further comprising: receiving an indication that the second network visibility appliance is not presently accessible; and modifying entries in the data structure that correspond to the second batch of data packets to indicate a third instance of the deduplication program, wherein the third instance of the deduplication program resides on a third network visibility appliance, and wherein said modifying causes the second batch of data packets to be forwarded to a second egress port for transmission to the third network visibility appliance.
 5. A computer-implemented method comprising: applying, by a network appliance, a specified transformation function to generate a value for each of a plurality of data packets received at a first ingress port of the network appliance, to produce a plurality of values; using, by the network appliance, the plurality of values to identify a first batch of data packets to be examined by a local instance of a deduplication program that executes in the network appliance, and a second batch of data packets to be examined by a remote instance of the deduplication program that executes in another network appliance; and forwarding, by the network appliance, the second batch of data packets to an egress port for transmission to the other network appliance.
 6. The computer-implemented method of claim 5, further comprising: causing, by the network appliance, the first batch of data packets to be examined by the local instance of the deduplication program; and forwarding, by the network appliance, at least a portion of the first batch of data packets to a tool port for transmission to a network tool.
 7. The computer-implemented method of claim 6, further comprising: receiving, by the network appliance, a third batch of data packets from the other network appliance at a second ingress port; causing, by the network appliance, the third batch of data packets to be examined by the local instance of the deduplication program; and forwarding, by the network appliance, at least a portion of the third batch of data packets to the tool port for transmission to the network tool.
 8. The computer-implemented method of claim 7, wherein the second ingress port corresponds to one end of a tunnel connected between the network appliance and the other network appliance.
 9. The computer-implemented method of claim 5, wherein using the plurality of values to identify the first and second batches of data packets comprises: accessing, by the network appliance, a data structure that maps the plurality of values amongst a plurality of instances of the deduplication program, wherein each value is mapped to only a single instance of the deduplication program, and wherein each instance of the deduplication program executes in a different network appliance; determining, by the network appliance, that each data packet in the first batch of data packets corresponds to an entry that specifies the local instance of the deduplication program; and determining, by the network appliance, that each data packet in the second batch of data packets corresponds to an entry that specifies the remote instance of the deduplication program.
 10. The computer-implemented method of claim 5, wherein each value is based on a field in a header of the plurality of data packets.
 11. The computer-implemented method of claim 5, wherein each value is based on at least one of: Transmission Control Protocol (TCP) sequence number, header length, payload length, type of service, protocol, source address, or destination address.
 12. A computer-implemented method comprising: identifying a plurality of network visibility appliances to be included in a distributed packet deduplication scheme; instantiating a plurality of load balancers associated with the plurality of network visibility appliances; instantiating a plurality of instances of a deduplication program on the plurality of network visibility appliances; and establishing a communication channel between a first network visibility appliance of the plurality of network visibility appliances and a second network visibility appliance of the plurality of network visibility appliances.
 13. The computer-implemented method of claim 12, wherein each instance of the plurality of instances of the deduplication program and each load balancer of the plurality of load balancers executes in a different network appliance of the plurality of network visibility appliances.
 14. The computer-implemented method of claim 12, further comprising: configuring the plurality of load balancers to apply an identical transformation function to incoming data packets to the plurality of network visibility appliances.
 15. The computer-implemented method of claim 12, further comprising: determining a number of network visibility appliances based on an expected volume of traffic, wherein the plurality of network visibility appliances to be included in the distributed deduplication scheme include the determined number of network visibility appliances.
 16. The computer-implemented method of claim 15, wherein the expected volume of traffic is estimated based on a historical volume or a number of virtual machines to be monitored.
 17. The computer-implemented method of claim 12, wherein establishing the communication channel includes: for each network visibility appliance of the plurality of network visibility appliances, configuring a corresponding list of network ports, wherein each entry in a list of network ports indicates (1) an instance of the plurality of instances of the deduplication program executing on the network visibility appliance, or (2) a tunnel to another network visibility appliance.
 18. The computer-implemented method of claim 12, wherein a load balancer associated with a first network visibility appliance of the plurality of network visibility appliances is configured to: generate a value for each data packet included in virtualized traffic received at an ingress port of the first network visibility appliance, thereby producing a plurality of values; identify a first value corresponding to a first data packet included in the virtualized traffic; access a data structure that includes an entry for each value, wherein each entry includes a routing instruction that specifies which instance of the plurality of instances of the deduplication program is responsible for examining the corresponding data packet; determine, based on the first value, that the first data packet is to be examined by a remote instance of the deduplication program that executes in a second network visibility appliance; and forward the first data packet to an egress port for transmission to the second network visibility appliance.
 19. The computer-implemented method of claim 18, wherein the load balancer is further configured to: identify a second value corresponding to a second data packet included in the virtualized traffic; determine, based on the second value, that the second data packet is to be examined by a local instance of the deduplication program that executes in the first network visibility appliance; and cause the second data packet to be examined by the local instance of the deduplication program.
 20. The computer-implemented method of claim 12, wherein the plurality of network visibility appliances include a tool port through which to route traffic to a network tool.
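
By way of non-limiting illustration, the routing recited in claims 3, 4, 18, and 19 might be sketched in Python as follows. Every name here (`NUM_BUCKETS`, `packet_value`, `build_table`, `LoadBalancer`, and the instance labels) is hypothetical and chosen only for this example. The use of SHA-256 over a tuple of header fields is likewise an assumption, since the claims require only that every appliance apply the same transformation function.

```python
import hashlib
from dataclasses import dataclass

# Assumption: values are folded into a small, fixed number of buckets.
NUM_BUCKETS = 16


def packet_value(header_fields: tuple) -> int:
    """Transform selected header fields into a bucket number.

    Every load balancer must use this same function so that duplicate
    copies of a packet yield the same value on every appliance.
    """
    digest = hashlib.sha256(repr(header_fields).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


def build_table(instances: list) -> dict:
    """Map each bucket to exactly one deduplication instance (round-robin)."""
    return {b: instances[b % len(instances)] for b in range(NUM_BUCKETS)}


@dataclass
class LoadBalancer:
    local_instance: str  # instance executing in this appliance
    table: dict          # bucket -> responsible instance

    def route(self, header_fields: tuple) -> tuple:
        """Return (action, instance) for one packet."""
        owner = self.table[packet_value(header_fields)]
        if owner == self.local_instance:
            return ("examine_locally", owner)
        return ("forward_to_egress", owner)

    def fail_over(self, failed: str, replacement: str) -> None:
        """Reassign every bucket owned by an inaccessible instance."""
        for bucket, owner in self.table.items():
            if owner == failed:
                self.table[bucket] = replacement
```

Because each appliance in this sketch holds an identical table and applies an identical function, two copies of the same packet resolve to the same instance regardless of which appliance received them. A production design would also need to propagate table changes to every appliance after a failover, which this sketch leaves out.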